Abstract. Using tax and census data, we demonstrate that the distribution of individual income in the USA is exponential. The Lorenz curve, which we calculate without fitting parameters, and the Gini coefficient 1/2 agree well with the data. From the individual income distribution, we derive the distribution function of income for families with two earners and show that it also agrees well with the data. The family data for the period 1947-1994 fit the Lorenz curve and the Gini coefficient 3/8 = 0.375 calculated for two-earner families.

PACS. 87.23.Ge Dynamics of social systems - 89.90.+n Other topics of general interest to physicists - 02.50.-r Probability theory, stochastic processes, and statistics



1 Introduction



The study of income distribution has a long history. Pareto [1] proposed in 1897 that income distribution obeys a universal power law valid for all times and countries. Subsequent studies have often disputed this conjecture. In 1935, Shirras [2] concluded: "There is indeed no Pareto Law. It is time it should be entirely discarded in studies on distribution." Mandelbrot [3] proposed a "weak Pareto law" applicable only asymptotically to the high incomes. In such a form, Pareto's proposal is useless for describing the great majority of the population.

Many other distributions of income were proposed: Levy, log-normal, Champernowne, Gamma, and two other forms by Pareto himself (see a systematic survey in the World Bank research publication [4]). Theoretical justifications for these proposals form two schools: socio-economic and statistical. The former appeals to economic, political, and demographic factors to explain the distribution of income (e.g. [5]), whereas the latter invokes stochastic processes. Gibrat proposed in 1931 that income is governed by a multiplicative random process, which results in a log-normal distribution (see also [7]). However, Kalecki [8] pointed out that the width of this distribution is not stationary, but increases in time. Levy and Solomon [9] proposed a cut-off at lower incomes, which stabilizes the distribution to a power law.

In this paper, we propose that the distribution of individual income is given by an exponential function. This conjecture is inspired by our previous work [10], where we argued that the probability distribution of money in a closed system of agents is given by the exponential Boltzmann-Gibbs function, in analogy with the distribution of energy in statistical physics. In Sec. 2, we compare our proposal with the census and tax data for individual income in the USA. In Sec. 3, we derive the distribution function of income for families with two earners and compare it with the census data. The good agreement we find is discussed in Sec. 4. Speculations on the possible origins of the exponential distribution of income are given in Sec. 5.

2 Distribution of individual income 

We denote income by the letter r (for "revenue"). The probability distribution function of income, P(r) (called the probability density in book [4]), is defined so that the fraction of individuals with income between r and r + dr is P(r) dr. This function is normalized to unity (100%): $\int_0^\infty P(r)\,dr = 1$. We propose that the probability distribution of individual income is exponential:

$$P_1(r) = \frac{1}{R}\,e^{-r/R}, \tag{1}$$
where the subscript 1 indicates individuals. Function (1) contains one parameter R, equal to the average income, $\int_0^\infty r\,P_1(r)\,dr = R$, and analogous to temperature in the Boltzmann-Gibbs distribution [10].
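As a quick numerical check of the two properties of Eq. (1), normalization to unity and mean equal to R, the Python sketch below integrates the density with a midpoint rule. The value R = 20,000 dollars and the cutoff at 50R are arbitrary illustrative choices, not fitted values.

```python
import math

def p1(r: float, R: float) -> float:
    """Exponential income density of Eq. (1): P1(r) = exp(-r/R) / R for r >= 0."""
    return math.exp(-r / R) / R

def midpoint_integral(f, a: float, b: float, n: int = 200_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

R = 20_000.0          # illustrative average income in dollars (assumption)
upper = 50 * R        # exp(-50) is negligible, so [0, 50R] stands in for [0, inf)

norm = midpoint_integral(lambda r: p1(r, R), 0.0, upper)
mean = midpoint_integral(lambda r: r * p1(r, R), 0.0, upper)
print(f"normalization = {norm:.6f}, mean = {mean:.1f}")
```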

From the Survey of Income and Program Participation (SIPP) we downloaded the variable TPTOINC (total income of a person for a month) for the first "wave" (a four-month period) in 1996. Then we eliminated the entries with zero income, grouped the remaining entries into bins of size 10/3 k$, counted the number of entries inside each bin, and normalized to the total number of entries. The results are shown as the histogram in Fig. 1, where the horizontal scale has been multiplied by 12 to convert monthly income to an annual figure. The solid line represents a fit to the exponential function (1). In the inset, plot A shows the same data with a logarithmic vertical scale. The data fall onto a straight line, whose slope gives the parameter R in Eq. (1). The exponential law is also often written with the bases 2 and 10: $P_1(r) \propto 2^{-r/R_2} \propto 10^{-r/R_{10}}$. The parameters $R$, $R_2$, and $R_{10}$ are given in line (c) of Table 1.

Fig. 1. Histogram: Probability distribution of individual income from the U.S. Census data for 1996 [11]. Solid line: Fit to the exponential law. Inset plot A: The same with the logarithmic vertical scale. Inset plot B: Cumulative probability distribution of individual income from PSID for 1992 [12]. Horizontal axis: individual annual income, k$.

Fig. 2. Points: Cumulative fraction of tax returns vs income from the IRS data for 1997 [13]. Solid line: Fit to the exponential law. Inset plot A: The same with the logarithmic vertical scale. Inset plot B: Probability distribution of individual income from the IRS data for 1993 [14]. Horizontal axis: adjusted gross income, k$.
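The procedure of binning, normalizing, and reading R off the slope on a logarithmic scale can be sketched in Python as follows. The incomes here are synthetic draws from Eq. (1) with a hypothetical R, and the bin width of 4 k$, the minimum bin count, and the count-weighted least-squares fit are our own assumptions for illustration; they are not the exact settings used for Fig. 1.

```python
import math
import random

def fit_exponential_slope(incomes, bin_width, min_count=20):
    """Estimate R from the slope of ln(histogram count) versus income.

    For P1(r) = exp(-r/R)/R the expected count in the bin [a, a + w) is
    proportional to exp(-a/R), so ln(count) is linear in a with slope -1/R.
    Bins with fewer than `min_count` entries are dropped, and each bin is
    weighted by its count (the Poisson variance of ln(count) is ~1/count).
    """
    counts = {}
    for r in incomes:
        if r > 0:                                  # drop zero-income entries
            k = int(r // bin_width)
            counts[k] = counts.get(k, 0) + 1
    pts = [(k * bin_width, n) for k, n in counts.items() if n >= min_count]
    sw = sum(n for _, n in pts)
    mx = sum(n * a for a, n in pts) / sw
    my = sum(n * math.log(n) for _, n in pts) / sw
    sxy = sum(n * (a - mx) * (math.log(n) - my) for a, n in pts)
    sxx = sum(n * (a - mx) ** 2 for a, n in pts)
    return -sxx / sxy                              # R = -1 / slope

random.seed(1)
R_true = 20_000.0                                  # hypothetical average annual income
# Synthetic monthly incomes with mean R_true / 12 (expovariate takes the rate).
monthly = [random.expovariate(12 / R_true) for _ in range(100_000)]
annual = [12 * m for m in monthly]                 # monthly -> annual, as for Fig. 1
R_fit = fit_exponential_slope(annual, bin_width=4_000.0)
print(f"fitted R = {R_fit:.0f} (true value {R_true:.0f})")
```

Weighting each bin by its count keeps the sparsely populated high-income bins, whose logarithms are noisy, from dominating the slope.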

Plot B in the inset of Fig. 1 shows the data from the Panel Study of Income Dynamics (PSID) conducted by the Institute for Social Research of the University of Michigan [12]. We downloaded the variable V30821 "Total 1992 labor income" for individuals from the Final Release 1993 and processed the data in a similar manner. Shown is the cumulative probability distribution of income N(r) (called the probability distribution in book [4]). It is defined as $N(r) = \int_r^\infty P(r')\,dr'$ and gives the fraction of individuals with income greater than r. For the exponential distribution (1), the cumulative distribution is also exponential: $N_1(r) = \int_r^\infty P_1(r')\,dr' = e^{-r/R}$. Thus, $R_2$ is the median income; 10% of the population have income greater than $R_{10}$, and only 1% greater than $2R_{10}$. The points in the inset fall onto a straight line on the logarithmic scale. The slope is given in line (a) of Table 1.
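Because $e^{-r/R} = 2^{-r/(R\ln 2)} = 10^{-r/(R\ln 10)}$, the three parameters are related by $R_2 = R\ln 2$ and $R_{10} = R\ln 10$. The Python sketch below applies these relations to the values of R in Table 1 and evaluates $N_1(r)$ at $R_2$, $R_{10}$, and $2R_{10}$; the helper names are our own.

```python
import math

# Values of R (in dollars) from Table 1, keyed by line label.
TABLE_R = {"a": 18_844, "b": 19_686, "c": 20_286, "d": 23_242, "e": 35_200}

def bases_2_and_10(R: float) -> tuple[float, float]:
    """Convert R of exp(-r/R) into R2 of 2**(-r/R2) and R10 of 10**(-r/R10)."""
    return R * math.log(2), R * math.log(10)

def tail_fraction(r: float, R: float) -> float:
    """Cumulative distribution N1(r) = exp(-r/R): fraction with income above r."""
    return math.exp(-r / R)

for line, R in TABLE_R.items():
    R2, R10 = bases_2_and_10(R)
    print(f"({line}) R={R:,}  R2={R2:,.0f}  R10={R10:,.0f}  "
          f"N1(R2)={tail_fraction(R2, R):.2f}  "
          f"N1(R10)={tail_fraction(R10, R):.2f}  "
          f"N1(2*R10)={tail_fraction(2 * R10, R):.2f}")
```

Rounded to whole dollars, the computed $R_2$ and $R_{10}$ reproduce the corresponding columns of Table 1, and the tail fractions are 0.50, 0.10, and 0.01 for every line, independently of R.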

The points in Fig. 2 show the cumulative distribution of tax returns vs income in 1997 from column 1 of Table 1.1 of Ref. [13]. (We merged 1 k$ bins into 5 k$ bins in the interval 1-20 k$.) The solid line is a fit to the exponential law. Plot A in the inset of Fig. 2 shows the same data with the logarithmic vertical scale. The slope is given in line (e) of Table 1. Plot B in the inset of Fig. 2 shows the distribution of individual income from tax returns in 1993 [14]. The logarithmic slope is given in line (b) of Table 1.

Table 1. Parameters R, R_2, and R_10 obtained by fitting data from different sources to the exponential law (1) with the bases e, 2, and 10, and the sizes of the statistical data sets.

      Source                 Year   R ($)    R_2 ($)   R_10 ($)   Set size
(a)   PSID [12]              1992   18,844   13,062    43,390     1.39×10^4
(b)   IRS [14]               1993   19,686   13,645    45,329     1.15×10^8
(c)   SIPP (persons) [11]    1996   20,286   14,061    46,710     2.57×10^4
(d)   SIPP (families) [11]   1996   23,242   16,110    53,517     1.64×10^4
(e)   IRS [13]               1997   35,200   24,399    81,051     1.22×10^8

While Figs. 1 and 2 clearly demonstrate the fit of the income distribution to the exponential form, they have the following drawback: income extends to +∞, whereas the horizontal axes of the plots are finite, so the high-income data are left outside of the plots. The standard way to represent the full range of data is the so-called Lorenz curve (for an introduction to the Lorenz curve and the Gini coefficient, see book [4]). The horizontal axis of the Lorenz curve, x(r), represents the cumulative fraction of the population with income below r, and the vertical axis, y(r), represents the fraction of income this population accounts for:

$$x(r) = \int_0^r P(r')\,dr', \qquad y(r) = \frac{\int_0^r r'\,P(r')\,dr'}{\int_0^\infty r'\,P(r')\,dr'}. \tag{2}$$



As r changes from 0 to ∞, x and y change from 0 to 1, and Eq. (2) parametrically defines a curve in the (x, y)-space. Substituting Eq. (1) into Eq. (2), we find

$$x(\tilde r) = 1 - e^{-\tilde r}, \qquad y(\tilde r) = x(\tilde r) - \tilde r\,e^{-\tilde r}, \tag{3}$$

where $\tilde r = r/R$. Eliminating $\tilde r$, we find the explicit form of the Lorenz curve for the exponential distribution:

$$y = x + (1 - x)\ln(1 - x). \tag{4}$$

The parameter R drops out, so Eq. (4) has no fitting parameters.
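To illustrate that Eq. (4) contains no fitting parameter, the Python sketch below builds an empirical Lorenz curve, a discrete version of Eq. (2), from synthetic exponential samples with two different values of R and compares it with Eq. (4). The sample size, seed, and values of R are arbitrary assumptions.

```python
import math
import random

def empirical_lorenz(incomes):
    """Empirical Lorenz curve, a discrete version of Eq. (2).

    After sorting, x[k] is the fraction of people among the k lowest incomes
    and y[k] is the fraction of total income that they receive.
    """
    s = sorted(incomes)
    total = sum(s)
    n = len(s)
    xs, ys, acc = [0.0], [0.0], 0.0
    for k, r in enumerate(s, start=1):
        acc += r
        xs.append(k / n)
        ys.append(acc / total)
    return xs, ys

def lorenz_exponential(x):
    """Eq. (4): y = x + (1 - x) ln(1 - x), using the limit y = 1 at x = 1."""
    if x >= 1.0:
        return 1.0
    return x + (1.0 - x) * math.log(1.0 - x)

random.seed(2)
worst_by_R = {}
for R in (10_000.0, 50_000.0):       # two arbitrary average incomes
    sample = [random.expovariate(1 / R) for _ in range(100_000)]
    xs, ys = empirical_lorenz(sample)
    worst_by_R[R] = max(abs(y - lorenz_exponential(x)) for x, y in zip(xs, ys))
    print(f"R = {R:8.0f}: max |y_empirical - y_Eq4| = {worst_by_R[R]:.4f}")
```

For both values of R the maximum deviation should be of the order of the sampling noise, roughly $1/\sqrt{n}$, which is the sense in which R drops out.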

The function (4) is shown as the solid curve in Fig. 3. The straight diagonal line represents the Lorenz curve in the case where the whole population has equal income. Inequality of the income distribution is measured by the Gini coefficient G, the ratio of the area between the diagonal and the Lorenz curve to the area of the triangle beneath the diagonal: $G = 2\int_0^1 (x - y)\,dx$. The Gini coefficient is confined between 0 (no inequality) and 1 (extreme inequality). By substituting Eq. (4) into the integral, we find the Gini coefficient for the exponential distribution: $G_1 = 1/2$.

Fig. 3. Solid curve: Lorenz plot for the exponential distribution. Points: IRS data for 1979-1997. Inset points: Gini coefficient data from IRS [15]. Inset line: The calculated value 1/2 of the Gini coefficient for the exponential distribution. Horizontal axis: cumulative percent of tax returns.
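The integral behind $G_1 = 1/2$ takes one substitution, $u = 1 - x$, together with the elementary result $\int_0^1 u\ln u\,du = -1/4$ (the boundary term $u^2\ln u$ vanishes as $u \to 0$):

$$
G_1 = 2\int_0^1 \bigl[x - y(x)\bigr]\,dx
    = -2\int_0^1 (1 - x)\ln(1 - x)\,dx
    = -2\int_0^1 u\ln u\,du
    = -2\left[\frac{u^2\ln u}{2} - \frac{u^2}{4}\right]_0^1
    = \frac{1}{2}.
$$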

The points in Fig. 3 represent the tax data during 1979-1997 from Ref. [15]. As time progressed, the Lorenz points shifted downward and the Gini coefficient increased from 0.47 to 0.56, which indicates increasing inequality during this period. However, overall the Gini coefficient is close to the value 0.5 calculated for the exponential distribution, as shown in the inset of Fig. 3.

3 Income distribution for two-earner families

Now let us discuss the distribution of income for families with two earners. The family income r is the sum of two individual incomes: $r = r_1 + r_2$. Thus, the probability distribution of the family income is given by the convolution of the individual probability distributions [16]. If the latter are given by the exponential function (1), the two-earner probability distribution function $P_2(r)$ is

$$P_2(r) = \int_0^r P_1(r')\,P_1(r - r')\,dr' = \frac{r}{R^2}\,e^{-r/R}. \tag{5}$$

The function $P_2(r)$ (5) differs from the function $P_1(r)$ (1) by the prefactor r/R, which reflects the phase space available to compose a given total income out of two individual ones. It is shown as the solid curve in Fig. 4. Unlike $P_1(r)$, which has a maximum at zero income, $P_2(r)$ has a maximum at r = R and looks qualitatively similar to the family income distribution curves in the literature.
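Since the convolution in Eq. (5) is the density of a sum of two independent exponential incomes, it can be checked by simulation. The Python sketch below adds two synthetic individual incomes drawn from Eq. (1) with a hypothetical R and compares the result with the mean 2R and with the cumulative fraction $1 - (1 + r/R)e^{-r/R}$ implied by Eq. (5); the sample size and seed are arbitrary.

```python
import math
import random

def p2(r: float, R: float) -> float:
    """Two-earner density of Eq. (5): P2(r) = (r / R**2) exp(-r/R)."""
    return r / R**2 * math.exp(-r / R)

def cdf2(r: float, R: float) -> float:
    """Integral of P2 from 0 to r: 1 - (1 + r/R) exp(-r/R)."""
    f = r / R
    return 1.0 - (1.0 + f) * math.exp(-f)

random.seed(3)
R = 20_000.0                                   # hypothetical individual average income
n = 200_000
# Family income as the sum of two independent exponential individual incomes.
family = [random.expovariate(1 / R) + random.expovariate(1 / R) for _ in range(n)]

mean_family = sum(family) / n                  # Eq. (5) predicts 2R
frac_below_R = sum(r < R for r in family) / n  # Eq. (5) predicts 1 - 2/e
print(f"mean = {mean_family:.0f} (2R = {2 * R:.0f})")
print(f"fraction below R = {frac_below_R:.4f} (theory {cdf2(R, R):.4f})")

# P2 peaks at r = R: compare the scaled density below, at, and above R.
print([round(p2(r, R) * R, 4) for r in (0.5 * R, R, 1.5 * R)])
```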

From the same 1996 SIPP that we used in Sec. 2, we downloaded the variable TFTOTINC (the total family income for a month), which we then multiplied by 12 to get annual income. Using the number of family members (the variable EFNP) and the number of children under 18 (the variable RFNKIDS), we selected the families with two adults. Their distribution of family income is shown by the histogram in Fig. 4. The fit to the function (5), shown by the solid line, gives the parameter R listed in line (d) of Table 1. The families with two adults and with more than two adults constitute 44% and 11% of all families in the studied data set, respectively. The remaining 45% are families with one adult. Assuming that families with two or more adults have two earners and families with one adult have one earner, we expect the income distribution for all families to be given by the superposition of Eqs. (1) and (5): $0.45P_1(r) + 0.55P_2(r)$. It is shown by the solid line in the inset of Fig. 4 (with R from line (d) of Table 1) together with the histogram of the all-families data.

Fig. 4. Histogram: Probability distribution of income for families with two adults in 1996 [11]. Solid line: Fit to Eq. (5). Inset histogram: Probability distribution of income for all families in 1996 [11]. Inset solid line: $0.45P_1(r) + 0.55P_2(r)$.
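The superposition $0.45P_1(r) + 0.55P_2(r)$ is itself a normalized density, with mean $0.45R + 0.55\cdot 2R = 1.55R$. The Python sketch below checks both statements numerically with R from line (d) of Table 1; the weights are the SIPP family shares quoted above, and the midpoint-rule integration is our own choice.

```python
import math

def p1(r, R):
    """Eq. (1): one-earner (individual) density."""
    return math.exp(-r / R) / R

def p2(r, R):
    """Eq. (5): two-earner family density."""
    return r / R**2 * math.exp(-r / R)

def p_all_families(r, R, w1=0.45, w2=0.55):
    """Superposition w1*P1 + w2*P2 for all families (weights from the SIPP shares)."""
    return w1 * p1(r, R) + w2 * p2(r, R)

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

R = 23_242.0    # line (d) of Table 1
upper = 60 * R  # the tail beyond 60R is negligible
norm = midpoint_integral(lambda r: p_all_families(r, R), 0.0, upper)
mean = midpoint_integral(lambda r: r * p_all_families(r, R), 0.0, upper)
print(f"normalization = {norm:.6f}")
print(f"mean family income = {mean:,.0f} (0.45R + 0.55*2R = {1.55 * R:,.0f})")
```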

By substituting Eq. (5) into Eq. (2), we calculate the Lorenz curve for two-earner families:

$$x(\tilde r) = 1 - (1 + \tilde r)\,e^{-\tilde r}, \qquad y(\tilde r) = x(\tilde r) - \frac{\tilde r^2}{2}\,e^{-\tilde r}. \tag{6}$$

It is shown by the solid curve in Fig. 5. Given that $x - y = \tilde r^2 e^{-\tilde r}/2$ and $dx = \tilde r\,e^{-\tilde r}\,d\tilde r$, the Gini coefficient for two-earner families is $G_2 = 2\int_0^1 (x - y)\,dx = \int_0^\infty \tilde r^3 e^{-2\tilde r}\,d\tilde r = 3/8 = 0.375$. The points in Fig. 5 show the Lorenz data and Gini coefficient for family income during 1947-1994 from Table 1 of Ref. [17]. The Gini coefficient is very close to the calculated value 0.375.
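Both Gini coefficients, $G_1 = 1/2$ and $G_2 = 3/8$, can also be confirmed numerically by integrating $2\int_0^1 (x - y)\,dx$ along the parametric Lorenz curves (3) and (6). In the Python sketch below, the variable `f` denotes the dimensionless income r/R; the cutoff `f_max` and the trapezoid rule are our own assumptions.

```python
import math

def lorenz_individual(f):
    """Eq. (3): Lorenz point (x, y) for the exponential distribution, f = r/R."""
    x = 1.0 - math.exp(-f)
    return x, x - f * math.exp(-f)

def lorenz_two_earner(f):
    """Eq. (6): Lorenz point (x, y) for the two-earner distribution, f = r/R."""
    x = 1.0 - (1.0 + f) * math.exp(-f)
    return x, x - 0.5 * f * f * math.exp(-f)

def gini_from_parametric(lorenz, f_max=60.0, n=200_000):
    """G = 2 * integral_0^1 (x - y) dx by the trapezoid rule along f in [0, f_max].

    The neglected part beyond f_max = 60 is of order f**3 * exp(-f) < 1e-20.
    """
    x_prev, y_prev = lorenz(0.0)
    area = 0.0
    for k in range(1, n + 1):
        x, y = lorenz(f_max * k / n)
        area += 0.5 * ((x - y) + (x_prev - y_prev)) * (x - x_prev)
        x_prev, y_prev = x, y
    return 2.0 * area

G1 = gini_from_parametric(lorenz_individual)
G2 = gini_from_parametric(lorenz_two_earner)
print(f"G1 = {G1:.6f} (exact 1/2), G2 = {G2:.6f} (exact 3/8)")
```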

4 Discussion 

Figs. 1 and 2 demonstrate that the exponential law (1) fits the individual income distribution very well. The Lorenz data for the individual income follow Eq. (4) without fitting parameters, and the Gini coefficient is close to the calculated value 0.5 (Fig. 3). The distributions of the individual and family income differ qualitatively. The former monotonically increases toward the low end and has a maximum at zero income (Fig. 1). The latter, typically being a sum of two individual incomes, has a maximum at a finite income and vanishes at zero (Fig. 4). Thus, the inequality of the family income distribution is smaller. The Lorenz data for families follow the different Eq. (6), again without fitting parameters, and the Gini coefficient is close to the smaller calculated value 0.375 (Fig. 5). Despite different definitions of income by different agencies, the parameters extracted from the fits (Table 1) are consistent, except for line (e).

Fig. 5. Solid curve: Lorenz plot (6) for distribution (5). Points: Census data for families, 1947-1994 [17]. Inset points: Gini coefficient data for families from the Census. Inset line: The calculated value 3/8 of the Gini coefficient for distribution (5). Horizontal axis: cumulative percent of families.

The qualitative difference between the individual and family income distributions was emphasized in Ref. [14], which split up joint tax returns of families into individual incomes and combined separately filed tax returns of married couples into family incomes. However, Refs. [13] and [15] counted only "individual tax returns", which also include joint tax returns. Since only a fraction of families file jointly, we assume that the latter contribution is small enough not to distort the tax-return distribution significantly from the individual income distribution. Similarly, the definition of a family for the data shown in the inset of Fig. 4 includes single adults and one-adult families with children, which constitute 35% and 10% of all families. The former category is excluded from the definition of a family for the data shown in Fig. 5, but the latter is included. Because the latter contribution is relatively small, we expect the family data in Fig. 5 to approximately represent the two-earner distribution (5). Technically, even for the families with two (or more) adults shown in Fig. 4, we do not know the exact number of earners.

With all these complications, one should not expect perfect accuracy for our fits. There are deviations around zero income in Figs. 1, 2, and 4. The fits could be improved there by multiplying the exponential function by a polynomial. However, the data may not be accurate at the low end because of underreporting. For example, filing a tax return is not required for incomes below a certain threshold, which in 1999 ranged from $2,750 to $14,400 [18]. As the Lorenz curves in Figs. 3 and 5 show, there are also deviations at the high end, possibly where Pareto's power law is supposed to work. Nevertheless, the exponential law gives an overall good description of the income distribution for the great majority of the population.



5 Possible origins of exponential distribution 

The exponential Boltzmann-Gibbs distribution naturally applies to quantities that obey a conservation law, such as energy or money [10]. However, there is no fundamental reason why the sum of incomes (unlike the sum of money) must be conserved. Indeed, income is a term in the time derivative of one's money balance (the other term is spending). Perhaps incomes obey an approximate conservation law, or somehow the distribution of income is simply proportional to the distribution of money, which is exponential [10].
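The role of a conservation law can be illustrated with a toy exchange model in Python. This is a hypothetical sketch under our own assumptions (unit transfers between randomly chosen agents, no debt, arbitrary sizes and seed), not necessarily the model studied in Ref. [10]. Because every transfer conserves the total, the money distribution relaxes toward a Boltzmann-Gibbs form, which for unit transfers is the discrete (geometric) analogue of an exponential with "temperature" equal to the average money $m_0$.

```python
import random

def simulate_money_exchange(n_agents=1_000, m0=10, n_steps=600_000, seed=4):
    """Toy model: random unit transfers between agents with the total conserved.

    Each step picks a random payer i and receiver j; if i != j and the payer
    has at least one unit (no debt allowed), one unit moves from i to j.
    """
    rng = random.Random(seed)
    money = [m0] * n_agents
    for _ in range(n_steps):
        i = rng.randrange(n_agents)
        j = rng.randrange(n_agents)
        if i != j and money[i] > 0:
            money[i] -= 1
            money[j] += 1
    return money

money = simulate_money_exchange()
n = len(money)
# Fraction of agents holding at least k units; initially 1 for k <= 10, 0 above.
tail = {k: sum(m >= k for m in money) / n for k in (0, 10, 20, 30)}
print("total money:", sum(money))   # every transfer conserves the total
for k, frac in tail.items():
    print(f"fraction with money >= {k}: {frac:.3f}")
```

Starting from equal holdings, the fractions should end up falling roughly geometrically, by a factor of about $(10/11)^{10} \approx 0.39$ per 10 units when $m_0 = 10$, as expected for an exponential-like distribution.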

Another explanation involves hierarchy. Groups of people have leaders, who have leaders of a higher order, and so on. The number of people decreases geometrically (exponentially) with the hierarchical level. If individual income increases linearly with the hierarchical level, then the income distribution is exponential. However, if income increases multiplicatively, then the distribution follows a power law [19]. For moderate incomes below $100,000, the linear increase may be more realistic. A similar scenario is Bernoulli trials [16], where individuals have a constant probability of increasing their income by a fixed amount.
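The hierarchy argument can be made concrete with a few lines of Python. The span of control (5 people per leader), the number of levels, and the two income ladders (an additive raise of 15 k$ per level versus a 50% raise per level) are hypothetical values chosen only for illustration.

```python
import math

def level_shares(levels: int, span: float) -> list[float]:
    """Share of people at hierarchical level k = 0, 1, ..., levels - 1.

    Each leader at level k + 1 manages `span` people at level k, so head
    counts shrink geometrically: share(k) is proportional to span**(-k).
    """
    weights = [span ** (-k) for k in range(levels)]
    total = sum(weights)
    return [w / total for w in weights]

SPAN = 5.0                     # hypothetical span of control
LEVELS = 8
shares = level_shares(LEVELS, SPAN)

# Hypothetical income ladders: additive raises versus percentage raises.
linear_income = [20_000 + 15_000 * k for k in range(LEVELS)]
mult_income = [20_000 * 1.5 ** k for k in range(LEVELS)]

# Additive ladder: d ln(share) / d r is constant, i.e. share ~ exp(-r / R_h)
# with R_h = 15_000 / ln(SPAN): an exponential distribution.
exp_slopes = [(math.log(shares[k + 1]) - math.log(shares[k]))
              / (linear_income[k + 1] - linear_income[k]) for k in range(LEVELS - 1)]
# Multiplicative ladder: d ln(share) / d ln(r) is constant, i.e. the share per
# level falls as r**(-ln(SPAN) / ln(1.5)): a power law.
pow_slopes = [(math.log(shares[k + 1]) - math.log(shares[k]))
              / (math.log(mult_income[k + 1]) - math.log(mult_income[k]))
              for k in range(LEVELS - 1)]

print("R_h from slopes:", [round(-1 / s) for s in exp_slopes])
print("power-law exponents:", [round(s, 3) for s in pow_slopes])
```

On the additive ladder, every pair of adjacent levels gives the same income scale $R_h = 15{,}000/\ln 5$, about 9,320 dollars, which is the signature of an exponential; on the multiplicative ladder, the log-log slope is constant instead, which is the signature of a power law.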

We are grateful to D. Jordan, M. Weber, and T. Petska for sending us the data from Refs. [17], [14], and [15], to T. Cranshaw for discussion of income distribution in Britain, and to M. Gubrud for proofreading of the manuscript.